{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lesson 12 - Pandas Part II\n",
    "\n",
    "In this lesson we will cover some more advanced features of [Pandas](http://pandas.pydata.org).\n",
    "\n",
    "### Readings\n",
    "\n",
    "* McKinney: [Chapter 6. Data Loading, Storage, and File Formats](http://proquest.safaribooksonline.com/book/programming/python/9781491957653/data-loading-storage-and-file-formats/io_html)\n",
    "* McKinney: [Chapter 7. Data Cleaning and Preparation](http://proquest.safaribooksonline.com/book/programming/python/9781491957653/data-cleaning-and-preparation/data_preparation_html)\n",
    "* McKinney: [Chapter 8. Data Wrangling: Join, Combine, and Reshape](http://proquest.safaribooksonline.com/book/programming/python/9781491957653/data-wrangling-join-combine-and-reshape/wrangling_html)\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "* [concat](#concat)\n",
    "* [append](#append)\n",
    "* [merge](#merge)\n",
    "* [join](#join)\n",
    "* [set_option](#set_option)\n",
    "* [stack](#stack)\n",
    "* [unstack](#unstack)\n",
    "* [values](#values)\n",
    "* [apply](#apply)\n",
    "* [lambda](#lambda)\n",
    "* [map](#map)\n",
    "* [sort_index](#sort_index)\n",
    "* [sort_values](#sort_values)\n",
    "* [isnull](#isnull)\n",
    "* [fillna](#fillna)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import modules\n",
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"concat\"></a>\n",
    "\n",
    "### Concatenating (appending) and merging (joining) DataFrames\n",
    "See [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/merging.html) for more information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This table describes how the funcitons were are going to learn about are related to each other:\n",
    "\n",
    "Action        | Combine two          | Add one to another\n",
    "--------------|----------------------|-------------------\n",
    "Concatenating | pd.concat([df1, df2]) | df1.append(df2)\n",
    "Merging       | pd.merge(df1, df2)   | df1.join(df2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### concat\n",
    "\n",
    "    pd.concat(objs, axis=0, join='outer', join_axes=None, \n",
    "        ignore_index=False, keys=None, levels=None, names=None, \n",
    "        verify_integrity=False)\n",
    "       \n",
    "* objs: list or dict of Series, DataFrame, or Panel objects. If a dict is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below)\n",
    "* axis: {0, 1, ...}, default 0. The axis to concatenate along\n",
    "* join: {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on other axis(es). Outer for union and inner for intersection\n",
    "* join_axes: list of Index objects. Specific indexes to use for the other n - 1 axes instead of performing inner/outer set logic\n",
    "* keys: sequence, default None. Construct hierarchical index using the passed keys as the outermost level If multiple levels passed, should contain tuples.\n",
    "* levels : list of sequences, default None. If keys passed, specific levels to use for the resulting MultiIndex. Otherwise they will be inferred from the keys\n",
    "* names: list, default None. Names for the levels in the resulting hierarchical index\n",
    "* verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation\n",
    "* ignore_index : boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n",
    "                    'B': ['B0', 'B1', 'B2', 'B3'],\n",
    "                    'C': ['C0', 'C1', 'C2', 'C3'],\n",
    "                    'D': ['D0', 'D1', 'D2', 'D3']},\n",
    "                    index=[0, 1, 2, 3])\n",
    "df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],\n",
    "                    'B': ['B4', 'B5', 'B6', 'B7'],\n",
    "                    'C': ['C4', 'C5', 'C6', 'C7'],\n",
    "                    'D': ['D4', 'D5', 'D6', 'D7']},\n",
    "                    index=[4, 5, 6, 7]) \n",
    "df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],\n",
    "                    'B': ['B8', 'B9', 'B10', 'B11'],\n",
    "                    'C': ['C8', 'C9', 'C10', 'C11'],\n",
    "                    'D': ['D8', 'D9', 'D10', 'D11']},\n",
    "                    index=[8, 9, 10, 11])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D\n",
       "0  A0  B0  C0  D0\n",
       "1  A1  B1  C1  D1\n",
       "2  A2  B2  C2  D2\n",
       "3  A3  B3  C3  D3"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A4</td>\n",
       "      <td>B4</td>\n",
       "      <td>C4</td>\n",
       "      <td>D4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>A5</td>\n",
       "      <td>B5</td>\n",
       "      <td>C5</td>\n",
       "      <td>D5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A6</td>\n",
       "      <td>B6</td>\n",
       "      <td>C6</td>\n",
       "      <td>D6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A7</td>\n",
       "      <td>B7</td>\n",
       "      <td>C7</td>\n",
       "      <td>D7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D\n",
       "4  A4  B4  C4  D4\n",
       "5  A5  B5  C5  D5\n",
       "6  A6  B6  C6  D6\n",
       "7  A7  B7  C7  D7"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>A8</td>\n",
       "      <td>B8</td>\n",
       "      <td>C8</td>\n",
       "      <td>D8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>A9</td>\n",
       "      <td>B9</td>\n",
       "      <td>C9</td>\n",
       "      <td>D9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>A10</td>\n",
       "      <td>B10</td>\n",
       "      <td>C10</td>\n",
       "      <td>D10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>A11</td>\n",
       "      <td>B11</td>\n",
       "      <td>C11</td>\n",
       "      <td>D11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B    C    D\n",
       "8    A8   B8   C8   D8\n",
       "9    A9   B9   C9   D9\n",
       "10  A10  B10  C10  D10\n",
       "11  A11  B11  C11  D11"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A4</td>\n",
       "      <td>B4</td>\n",
       "      <td>C4</td>\n",
       "      <td>D4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>A5</td>\n",
       "      <td>B5</td>\n",
       "      <td>C5</td>\n",
       "      <td>D5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A6</td>\n",
       "      <td>B6</td>\n",
       "      <td>C6</td>\n",
       "      <td>D6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A7</td>\n",
       "      <td>B7</td>\n",
       "      <td>C7</td>\n",
       "      <td>D7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>A8</td>\n",
       "      <td>B8</td>\n",
       "      <td>C8</td>\n",
       "      <td>D8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>A9</td>\n",
       "      <td>B9</td>\n",
       "      <td>C9</td>\n",
       "      <td>D9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>A10</td>\n",
       "      <td>B10</td>\n",
       "      <td>C10</td>\n",
       "      <td>D10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>A11</td>\n",
       "      <td>B11</td>\n",
       "      <td>C11</td>\n",
       "      <td>D11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B    C    D\n",
       "0    A0   B0   C0   D0\n",
       "1    A1   B1   C1   D1\n",
       "2    A2   B2   C2   D2\n",
       "3    A3   B3   C3   D3\n",
       "4    A4   B4   C4   D4\n",
       "5    A5   B5   C5   D5\n",
       "6    A6   B6   C6   D6\n",
       "7    A7   B7   C7   D7\n",
       "8    A8   B8   C8   D8\n",
       "9    A9   B9   C9   D9\n",
       "10  A10  B10  C10  D10\n",
       "11  A11  B11  C11  D11"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frames = [df1, df2, df3]\n",
    "result = pd.concat(frames)\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">x</th>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">y</th>\n",
       "      <th>4</th>\n",
       "      <td>A4</td>\n",
       "      <td>B4</td>\n",
       "      <td>C4</td>\n",
       "      <td>D4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>A5</td>\n",
       "      <td>B5</td>\n",
       "      <td>C5</td>\n",
       "      <td>D5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A6</td>\n",
       "      <td>B6</td>\n",
       "      <td>C6</td>\n",
       "      <td>D6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A7</td>\n",
       "      <td>B7</td>\n",
       "      <td>C7</td>\n",
       "      <td>D7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">z</th>\n",
       "      <th>8</th>\n",
       "      <td>A8</td>\n",
       "      <td>B8</td>\n",
       "      <td>C8</td>\n",
       "      <td>D8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>A9</td>\n",
       "      <td>B9</td>\n",
       "      <td>C9</td>\n",
       "      <td>D9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>A10</td>\n",
       "      <td>B10</td>\n",
       "      <td>C10</td>\n",
       "      <td>D10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>A11</td>\n",
       "      <td>B11</td>\n",
       "      <td>C11</td>\n",
       "      <td>D11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        A    B    C    D\n",
       "x 0    A0   B0   C0   D0\n",
       "  1    A1   B1   C1   D1\n",
       "  2    A2   B2   C2   D2\n",
       "  3    A3   B3   C3   D3\n",
       "y 4    A4   B4   C4   D4\n",
       "  5    A5   B5   C5   D5\n",
       "  6    A6   B6   C6   D6\n",
       "  7    A7   B7   C7   D7\n",
       "z 8    A8   B8   C8   D8\n",
       "  9    A9   B9   C9   D9\n",
       "  10  A10  B10  C10  D10\n",
       "  11  A11  B11  C11  D11"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.concat(frames, keys=('x', 'y', 'z'))\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A4</td>\n",
       "      <td>B4</td>\n",
       "      <td>C4</td>\n",
       "      <td>D4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>A5</td>\n",
       "      <td>B5</td>\n",
       "      <td>C5</td>\n",
       "      <td>D5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A6</td>\n",
       "      <td>B6</td>\n",
       "      <td>C6</td>\n",
       "      <td>D6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A7</td>\n",
       "      <td>B7</td>\n",
       "      <td>C7</td>\n",
       "      <td>D7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D\n",
       "4  A4  B4  C4  D4\n",
       "5  A5  B5  C5  D5\n",
       "6  A6  B6  C6  D6\n",
       "7  A7  B7  C7  D7"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result.loc['y']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>B</th>\n",
       "      <th>D</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>B2</td>\n",
       "      <td>D2</td>\n",
       "      <td>F2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>B3</td>\n",
       "      <td>D3</td>\n",
       "      <td>F3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>B6</td>\n",
       "      <td>D6</td>\n",
       "      <td>F6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>B7</td>\n",
       "      <td>D7</td>\n",
       "      <td>F7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    B   D   F\n",
       "2  B2  D2  F2\n",
       "3  B3  D3  F3\n",
       "6  B6  D6  F6\n",
       "7  B7  D7  F7"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],\n",
    "                    'D': ['D2', 'D3', 'D6', 'D7'],\n",
    "                    'F': ['F2', 'F3', 'F6', 'F7']},\n",
    "                    index=[2, 3, 6, 7])\n",
    "df4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D\n",
       "0  A0  B0  C0  D0\n",
       "1  A1  B1  C1  D1\n",
       "2  A2  B2  C2  D2\n",
       "3  A3  B3  C3  D3"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/luke.thompson/miniconda3/envs/python3/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version\n",
      "of pandas will change to not sort by default.\n",
      "\n",
      "To accept the future behavior, pass 'sort=False'.\n",
      "\n",
      "To retain the current behavior and silence the warning, pass 'sort=True'.\n",
      "\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D2</td>\n",
       "      <td>F2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D3</td>\n",
       "      <td>F3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D6</td>\n",
       "      <td>F6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D7</td>\n",
       "      <td>F7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A   B    C   D    F\n",
       "0   A0  B0   C0  D0  NaN\n",
       "1   A1  B1   C1  D1  NaN\n",
       "2   A2  B2   C2  D2  NaN\n",
       "3   A3  B3   C3  D3  NaN\n",
       "2  NaN  B2  NaN  D2   F2\n",
       "3  NaN  B3  NaN  D3   F3\n",
       "6  NaN  B6  NaN  D6   F6\n",
       "7  NaN  B7  NaN  D7   F7"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df4]) # axis=0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>B</th>\n",
       "      <th>D</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "      <td>B2</td>\n",
       "      <td>D2</td>\n",
       "      <td>F2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "      <td>B3</td>\n",
       "      <td>D3</td>\n",
       "      <td>F3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>B6</td>\n",
       "      <td>D6</td>\n",
       "      <td>F6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>B7</td>\n",
       "      <td>D7</td>\n",
       "      <td>F7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A    B    C    D    B    D    F\n",
       "0   A0   B0   C0   D0  NaN  NaN  NaN\n",
       "1   A1   B1   C1   D1  NaN  NaN  NaN\n",
       "2   A2   B2   C2   D2   B2   D2   F2\n",
       "3   A3   B3   C3   D3   B3   D3   F3\n",
       "6  NaN  NaN  NaN  NaN   B6   D6   F6\n",
       "7  NaN  NaN  NaN  NaN   B7   D7   F7"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df4], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>B</th>\n",
       "      <th>D</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "      <td>B2</td>\n",
       "      <td>D2</td>\n",
       "      <td>F2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "      <td>B3</td>\n",
       "      <td>D3</td>\n",
       "      <td>F3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>B6</td>\n",
       "      <td>D6</td>\n",
       "      <td>F6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>B7</td>\n",
       "      <td>D7</td>\n",
       "      <td>F7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A    B    C    D    B    D    F\n",
       "0   A0   B0   C0   D0  NaN  NaN  NaN\n",
       "1   A1   B1   C1   D1  NaN  NaN  NaN\n",
       "2   A2   B2   C2   D2   B2   D2   F2\n",
       "3   A3   B3   C3   D3   B3   D3   F3\n",
       "6  NaN  NaN  NaN  NaN   B6   D6   F6\n",
       "7  NaN  NaN  NaN  NaN   B7   D7   F7"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df4], axis=1, join='outer')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>B</th>\n",
       "      <th>D</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "      <td>B2</td>\n",
       "      <td>D2</td>\n",
       "      <td>F2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "      <td>B3</td>\n",
       "      <td>D3</td>\n",
       "      <td>F3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D   B   D   F\n",
       "2  A2  B2  C2  D2  B2  D2  F2\n",
       "3  A3  B3  C3  D3  B3  D3  F3"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df4], axis=1, join='inner')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/luke.thompson/miniconda3/envs/python3/lib/python3.7/site-packages/ipykernel_launcher.py:2: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version\n",
      "of pandas will change to not sort by default.\n",
      "\n",
      "To accept the future behavior, pass 'sort=False'.\n",
      "\n",
      "To retain the current behavior and silence the warning, pass 'sort=True'.\n",
      "\n",
      "  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D2</td>\n",
       "      <td>F2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D3</td>\n",
       "      <td>F3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D6</td>\n",
       "      <td>F6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D7</td>\n",
       "      <td>F7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A   B    C   D    F\n",
       "0   A0  B0   C0  D0  NaN\n",
       "1   A1  B1   C1  D1  NaN\n",
       "2   A2  B2   C2  D2  NaN\n",
       "3   A3  B3   C3  D3  NaN\n",
       "4  NaN  B2  NaN  D2   F2\n",
       "5  NaN  B3  NaN  D3   F3\n",
       "6  NaN  B6  NaN  D6   F6\n",
       "7  NaN  B7  NaN  D7   F7"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Ignoring indexes\n",
    "pd.concat([df1, df4], ignore_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>foo</th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   foo  0  1\n",
       "0    0  0  0\n",
       "1    1  1  1\n",
       "2    2  2  4\n",
       "3    3  3  5"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s3 = pd.Series([0, 1, 2, 3], name='foo')\n",
    "s4 = pd.Series([0, 1, 2, 3])\n",
    "s5 = pd.Series([0, 1, 4, 5])\n",
    "result = pd.concat([s3, s4, s5], axis=1)\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>red</th>\n",
       "      <th>blue</th>\n",
       "      <th>yellow</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   red  blue  yellow\n",
       "0    0     0       0\n",
       "1    1     1       1\n",
       "2    2     2       4\n",
       "3    3     3       5"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.concat([s3, s4, s5], axis=1, keys=('red', 'blue', 'yellow'))\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"append\"></a>\n",
    "\n",
    "#### append"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A4</td>\n",
       "      <td>B4</td>\n",
       "      <td>C4</td>\n",
       "      <td>D4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>A5</td>\n",
       "      <td>B5</td>\n",
       "      <td>C5</td>\n",
       "      <td>D5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A6</td>\n",
       "      <td>B6</td>\n",
       "      <td>C6</td>\n",
       "      <td>D6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A7</td>\n",
       "      <td>B7</td>\n",
       "      <td>C7</td>\n",
       "      <td>D7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D\n",
       "0  A0  B0  C0  D0\n",
       "1  A1  B1  C1  D1\n",
       "2  A2  B2  C2  D2\n",
       "3  A3  B3  C3  D3\n",
       "4  A4  B4  C4  D4\n",
       "5  A5  B5  C5  D5\n",
       "6  A6  B6  C6  D6\n",
       "7  A7  B7  C7  D7"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.append(df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A4</td>\n",
       "      <td>B4</td>\n",
       "      <td>C4</td>\n",
       "      <td>D4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>A5</td>\n",
       "      <td>B5</td>\n",
       "      <td>C5</td>\n",
       "      <td>D5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A6</td>\n",
       "      <td>B6</td>\n",
       "      <td>C6</td>\n",
       "      <td>D6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A7</td>\n",
       "      <td>B7</td>\n",
       "      <td>C7</td>\n",
       "      <td>D7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D\n",
       "0  A0  B0  C0  D0\n",
       "1  A1  B1  C1  D1\n",
       "2  A2  B2  C2  D2\n",
       "3  A3  B3  C3  D3\n",
       "4  A4  B4  C4  D4\n",
       "5  A5  B5  C5  D5\n",
       "6  A6  B6  C6  D6\n",
       "7  A7  B7  C7  D7"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.append(df2, sort=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/luke.thompson/miniconda3/envs/python3/lib/python3.7/site-packages/pandas/core/frame.py:6211: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version\n",
      "of pandas will change to not sort by default.\n",
      "\n",
      "To accept the future behavior, pass 'sort=False'.\n",
      "\n",
      "To retain the current behavior and silence the warning, pass 'sort=True'.\n",
      "\n",
      "  sort=sort)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D2</td>\n",
       "      <td>F2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D3</td>\n",
       "      <td>F3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D6</td>\n",
       "      <td>F6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>B7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D7</td>\n",
       "      <td>F7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A   B    C   D    F\n",
       "0   A0  B0   C0  D0  NaN\n",
       "1   A1  B1   C1  D1  NaN\n",
       "2   A2  B2   C2  D2  NaN\n",
       "3   A3  B3   C3  D3  NaN\n",
       "2  NaN  B2  NaN  D2   F2\n",
       "3  NaN  B3  NaN  D3   F3\n",
       "6  NaN  B6  NaN  D6   F6\n",
       "7  NaN  B7  NaN  D7   F7"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.append(df4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"merge\"></a>\n",
    "\n",
    "#### merge\n",
    "\n",
    "    pd.merge(left, right, how='inner', on=None, left_on=None, \n",
    "        right_on=None, left_index=False, right_index=False,\n",
    "        sort=True, suffixes=('_x', '_y'), copy=True, indicator=False)\n",
    "\n",
    "* left: A DataFrame object\n",
    "* right: Another DataFrame object\n",
    "* on: Columns (names) to join on. Must be found in both the left and right DataFrame objects. If not passed and left_index and right_index are False, the intersection of the columns in the DataFrames will be inferred to be the join keys\n",
    "* left_on: Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame\n",
    "* right_on: Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame\n",
    "* left_index: If True, use the index (row labels) from the left DataFrame as its join key(s). In the case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame\n",
    "* right_index: Same usage as left_index for the right DataFrame\n",
    "* how: One of 'left', 'right', 'outer', 'inner'. Defaults to inner. See below for more detailed description of each method\n",
    "* sort: Sort the result DataFrame by the join keys in lexicographical order. Defaults to True, setting to False will improve performance substantially in many cases\n",
    "* suffixes: A tuple of string suffixes to apply to overlapping columns. Defaults to ('_x', '_y').\n",
    "* copy: Always copy data (default True) from the passed DataFrame objects, even when reindexing is not necessary. Cannot be avoided in many cases but may improve performance / memory usage. The cases where copying can be avoided are somewhat pathological but this option is provided nonetheless.\n",
    "* indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in 'left' DataFrame, right_only for observations whose merge key only appears in 'right' DataFrame, and both if the observation’s merge key is found in both."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n",
    "                     'A': ['A0', 'A1', 'A2', 'A3'],\n",
    "                     'B': ['B0', 'B1', 'B2', 'B3']})\n",
    "right = pd.DataFrame({'key': ['K3', 'K2', 'K1', 'K0'],\n",
    "                      'C': ['C0', 'C1', 'C2', 'C3'],\n",
    "                      'D': ['D0', 'D1', 'D2', 'D3']})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K2</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K3</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key   A   B\n",
       "0  K0  A0  B0\n",
       "1  K1  A1  B1\n",
       "2  K2  A2  B2\n",
       "3  K3  A3  B3"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K3</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K0</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key   C   D\n",
       "0  K3  C0  D0\n",
       "1  K2  C1  D1\n",
       "2  K1  C2  D2\n",
       "3  K0  C3  D3"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K2</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K3</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key   A   B   C   D\n",
       "0  K0  A0  B0  C3  D3\n",
       "1  K1  A1  B1  C2  D2\n",
       "2  K2  A2  B2  C1  D1\n",
       "3  K3  A3  B3  C0  D0"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.merge(left, right, on='key')\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],\n",
    "                     'key2': ['K0', 'K1', 'K0', 'K1'],\n",
    "                     'A': ['A0', 'A1', 'A2', 'A3'],\n",
    "                     'B': ['B0', 'B1', 'B2', 'B3']})\n",
    "right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],\n",
    "                      'key2': ['K0', 'K0', 'K0', 'K0'],\n",
    "                      'C': ['C0', 'C1', 'C2', 'C3'],\n",
    "                      'D': ['D0', 'D1', 'D2', 'D3']})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K2</td>\n",
       "      <td>K1</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2   A   B\n",
       "0   K0   K0  A0  B0\n",
       "1   K0   K1  A1  B1\n",
       "2   K1   K0  A2  B2\n",
       "3   K2   K1  A3  B3"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K2</td>\n",
       "      <td>K0</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2   C   D\n",
       "0   K0   K0  C0  D0\n",
       "1   K1   K0  C1  D1\n",
       "2   K1   K0  C2  D2\n",
       "3   K2   K0  C3  D3"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2   A   B   C   D\n",
       "0   K0   K0  A0  B0  C0  D0\n",
       "1   K1   K0  A2  B2  C1  D1\n",
       "2   K1   K0  A2  B2  C2  D2"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(left, right, on=['key1', 'key2'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `how` argument to merge specifies how to determine which keys are to be included in the resulting table. If a key combination does not appear in either the left or right tables, the values in the joined table will be NA. Here is a summary of the how options and their SQL equivalent names:\n",
    "\n",
    "Merge method    | SQL Join Name | Description\n",
    "--------------- | ---------------- | -----------\n",
    "left            | LEFT OUTER JOIN  | Use keys from left frame only\n",
    "right\t        | RIGHT OUTER JOIN\t| Use keys from right frame only\n",
    "outer\t        | FULL OUTER JOIN\t| Use union of keys from both frames\n",
    "inner (default) | INNER JOIN\t    | Use intersection of keys from both frames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>K2</td>\n",
       "      <td>K1</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2   A   B    C    D\n",
       "0   K0   K0  A0  B0   C0   D0\n",
       "1   K0   K1  A1  B1  NaN  NaN\n",
       "2   K1   K0  A2  B2   C1   D1\n",
       "3   K1   K0  A2  B2   C2   D2\n",
       "4   K2   K1  A3  B3  NaN  NaN"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(left, right, how='left', on=['key1', 'key2'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K2</td>\n",
       "      <td>K0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2    A    B   C   D\n",
       "0   K0   K0   A0   B0  C0  D0\n",
       "1   K1   K0   A2   B2  C1  D1\n",
       "2   K1   K0   A2   B2  C2  D2\n",
       "3   K2   K0  NaN  NaN  C3  D3"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.merge(left, right, how='right', on=['key1', 'key2'])\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>K2</td>\n",
       "      <td>K1</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>K2</td>\n",
       "      <td>K0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2    A    B    C    D\n",
       "0   K0   K0   A0   B0   C0   D0\n",
       "1   K0   K1   A1   B1  NaN  NaN\n",
       "2   K1   K0   A2   B2   C1   D1\n",
       "3   K1   K0   A2   B2   C2   D2\n",
       "4   K2   K1   A3   B3  NaN  NaN\n",
       "5   K2   K0  NaN  NaN   C3   D3"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.merge(left, right, how='outer', on=['key1', 'key2'])\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2   A   B   C   D\n",
       "0   K0   K0  A0  B0  C0  D0\n",
       "1   K1   K0  A2  B2  C1  D1\n",
       "2   K1   K0  A2  B2  C2  D2"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.merge(left, right, how='inner', on=['key1', 'key2'])\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# merge indicator\n",
    "df1 = pd.DataFrame({'col1':[0,1], 'col_left':['a','b']})\n",
    "df2 = pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col_left</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1 col_left\n",
       "0     0        a\n",
       "1     1        b"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col_right</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col_right\n",
       "0     1          2\n",
       "1     2          2\n",
       "2     2          2"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col_left</th>\n",
       "      <th>col_right</th>\n",
       "      <th>Merged</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>a</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>b</td>\n",
       "      <td>2.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1 col_left  col_right      Merged\n",
       "0     0        a        NaN   left_only\n",
       "1     1        b        2.0        both\n",
       "2     2      NaN        2.0  right_only\n",
       "3     2      NaN        2.0  right_only"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# note: the argument indicator=True is an option with pandas 0.17.0 and greater\n",
    "result = pd.merge(df1, df2, on='col1', how='outer', indicator='Merged')\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"join\"></a>\n",
    "\n",
    "#### df.join()\n",
    "\n",
    "The related DataFrame.join method, uses merge internally for the index-on-index and index-on-column(s) joins, **but joins on indexes by default** rather than trying to join on common columns (the default behavior for merge). If you are joining on index, you may wish to use DataFrame.join to save yourself some typing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n",
    "                     'B': ['B0', 'B1', 'B2']},\n",
    "                     index=['K0', 'K1', 'K2'])\n",
    "right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],\n",
    "                      'D': ['D0', 'D2', 'D3']},\n",
    "                      index=['K0', 'K2', 'K3'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>K0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A   B\n",
       "K0  A0  B0\n",
       "K1  A1  B1\n",
       "K2  A2  B2"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>K0</th>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K2</th>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K3</th>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     C   D\n",
       "K0  C0  D0\n",
       "K2  C2  D2\n",
       "K3  C3  D3"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>K0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A   B    C    D\n",
       "K0  A0  B0   C0   D0\n",
       "K1  A1  B1  NaN  NaN\n",
       "K2  A2  B2   C2   D2"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left.join(right, how='left')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>K0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B   C   D\n",
       "K0   A0   B0  C0  D0\n",
       "K2   A2   B2  C2  D2\n",
       "K3  NaN  NaN  C3  D3"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left.join(right, how='right')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>K0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A   B   C   D\n",
       "K0  A0  B0  C0  D0\n",
       "K2  A2  B2  C2  D2"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left.join(right, how='inner')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>K0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>K3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B    C    D\n",
       "K0   A0   B0   C0   D0\n",
       "K1   A1   B1  NaN  NaN\n",
       "K2   A2   B2   C2   D2\n",
       "K3  NaN  NaN   C3   D3"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left.join(right, how='outer')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "# overlapping value columns: suffixes\n",
    "left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})\n",
    "right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>k</th>\n",
       "      <th>v</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    k  v\n",
       "0  K0  1\n",
       "1  K1  2\n",
       "2  K2  3"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>k</th>\n",
       "      <th>v</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K3</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    k  v\n",
       "0  K0  4\n",
       "1  K0  5\n",
       "2  K3  6"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>k</th>\n",
       "      <th>v_x</th>\n",
       "      <th>v_y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    k  v_x  v_y\n",
       "0  K0    1    4\n",
       "1  K0    1    5"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.merge(left, right, on='k')\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>k</th>\n",
       "      <th>v_l</th>\n",
       "      <th>v_r</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    k  v_l  v_r\n",
       "0  K0    1    4\n",
       "1  K0    1    5"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.merge(left, right, on='k', suffixes=['_l', '_r'])\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Why would I use this? Example: Merging gene expression tables\n",
    "degs = pd.DataFrame({'gene': ['cds1', 'cds3', 'cds6'], 'count': [345, 887, 459]})\n",
    "names = pd.DataFrame({'gene': ['cds1', 'cds2', 'cds3', 'cds4', 'cds5', 'cds6'], 'description': ['primase', 'ligase', 'aldolase', 'amylase', 'polymerase', 'kinase']})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gene</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>cds1</td>\n",
       "      <td>345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cds3</td>\n",
       "      <td>887</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>cds6</td>\n",
       "      <td>459</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   gene  count\n",
       "0  cds1    345\n",
       "1  cds3    887\n",
       "2  cds6    459"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "degs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gene</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>cds1</td>\n",
       "      <td>primase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cds2</td>\n",
       "      <td>ligase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>cds3</td>\n",
       "      <td>aldolase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>cds4</td>\n",
       "      <td>amylase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cds5</td>\n",
       "      <td>polymerase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>cds6</td>\n",
       "      <td>kinase</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   gene description\n",
       "0  cds1     primase\n",
       "1  cds2      ligase\n",
       "2  cds3    aldolase\n",
       "3  cds4     amylase\n",
       "4  cds5  polymerase\n",
       "5  cds6      kinase"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "degs_plus_names = pd.merge(degs, names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gene</th>\n",
       "      <th>count</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>cds1</td>\n",
       "      <td>345</td>\n",
       "      <td>primase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cds3</td>\n",
       "      <td>887</td>\n",
       "      <td>aldolase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>cds6</td>\n",
       "      <td>459</td>\n",
       "      <td>kinase</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   gene  count description\n",
       "0  cds1    345     primase\n",
       "1  cds3    887    aldolase\n",
       "2  cds6    459      kinase"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "degs_plus_names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "# argument settings on and how were inferred/defaults\n",
    "degs_plus_names = pd.merge(degs, names, on='gene', how='inner')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gene</th>\n",
       "      <th>count</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>cds1</td>\n",
       "      <td>345</td>\n",
       "      <td>primase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cds3</td>\n",
       "      <td>887</td>\n",
       "      <td>aldolase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>cds6</td>\n",
       "      <td>459</td>\n",
       "      <td>kinase</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   gene  count description\n",
       "0  cds1    345     primase\n",
       "1  cds3    887    aldolase\n",
       "2  cds6    459      kinase"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "degs_plus_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### merge/join by index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>cds1</th>\n",
       "      <td>345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds3</th>\n",
       "      <td>887</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds6</th>\n",
       "      <td>459</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      count\n",
       "gene       \n",
       "cds1    345\n",
       "cds3    887\n",
       "cds6    459"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "degs.index = degs.gene\n",
    "degs.drop('gene', axis=1, inplace=True)\n",
    "degs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>cds1</th>\n",
       "      <td>primase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds2</th>\n",
       "      <td>ligase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds3</th>\n",
       "      <td>aldolase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds4</th>\n",
       "      <td>amylase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds5</th>\n",
       "      <td>polymerase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds6</th>\n",
       "      <td>kinase</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     description\n",
       "gene            \n",
       "cds1     primase\n",
       "cds2      ligase\n",
       "cds3    aldolase\n",
       "cds4     amylase\n",
       "cds5  polymerase\n",
       "cds6      kinase"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names.index = names.gene\n",
    "names.drop('gene', axis=1, inplace=True)\n",
    "names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>cds1</th>\n",
       "      <td>345</td>\n",
       "      <td>primase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds3</th>\n",
       "      <td>887</td>\n",
       "      <td>aldolase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cds6</th>\n",
       "      <td>459</td>\n",
       "      <td>kinase</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      count description\n",
       "gene                   \n",
       "cds1    345     primase\n",
       "cds3    887    aldolase\n",
       "cds6    459      kinase"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(left=degs, right=names, left_index=True, right_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"set_option\"></a>\n",
    "\n",
    "### World Series example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### pd.set_option()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set the maximum number of DataFrame rows displayed\n",
    "pd.set_option(\"display.max_rows\", 10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ws = pd.read_csv('../data/world_series_winners.txt', header=None)\n",
    "df_ws.columns = ['team']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston Americans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>No World Series</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York Giants</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chicago White Sox</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Chicago Cubs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>111</th>\n",
       "      <td>San Francisco Giants</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112</th>\n",
       "      <td>Kansas City Royals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113</th>\n",
       "      <td>Chicago Cubs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>114</th>\n",
       "      <td>Houston Astros</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>115</th>\n",
       "      <td>Boston Red Sox</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>116 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                     team\n",
       "0        Boston Americans\n",
       "1         No World Series\n",
       "2         New York Giants\n",
       "3       Chicago White Sox\n",
       "4            Chicago Cubs\n",
       "..                    ...\n",
       "111  San Francisco Giants\n",
       "112    Kansas City Royals\n",
       "113          Chicago Cubs\n",
       "114        Houston Astros\n",
       "115        Boston Red Sox\n",
       "\n",
       "[116 rows x 1 columns]"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ws"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_new = pd.DataFrame({'team': ['San Diego Padres']})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>San Diego Padres</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               team\n",
       "0  San Diego Padres"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_new"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### pd.concat()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston Americans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>No World Series</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York Giants</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chicago White Sox</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Chicago Cubs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112</th>\n",
       "      <td>Kansas City Royals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113</th>\n",
       "      <td>Chicago Cubs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>114</th>\n",
       "      <td>Houston Astros</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>115</th>\n",
       "      <td>Boston Red Sox</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>116</th>\n",
       "      <td>San Diego Padres</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>117 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                   team\n",
       "0      Boston Americans\n",
       "1       No World Series\n",
       "2       New York Giants\n",
       "3     Chicago White Sox\n",
       "4          Chicago Cubs\n",
       "..                  ...\n",
       "112  Kansas City Royals\n",
       "113        Chicago Cubs\n",
       "114      Houston Astros\n",
       "115      Boston Red Sox\n",
       "116    San Diego Padres\n",
       "\n",
       "[117 rows x 1 columns]"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.concat([df_ws, df_new], ignore_index=True)\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### add columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "117"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2['year'] = np.arange(1903, 1903 + df2.shape[0])\n",
    "df2['first_initial'] = [team[0] for team in df2['team']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team</th>\n",
       "      <th>year</th>\n",
       "      <th>first_initial</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston Americans</td>\n",
       "      <td>1903</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>No World Series</td>\n",
       "      <td>1904</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York Giants</td>\n",
       "      <td>1905</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chicago White Sox</td>\n",
       "      <td>1906</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Chicago Cubs</td>\n",
       "      <td>1907</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112</th>\n",
       "      <td>Kansas City Royals</td>\n",
       "      <td>2015</td>\n",
       "      <td>K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113</th>\n",
       "      <td>Chicago Cubs</td>\n",
       "      <td>2016</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>114</th>\n",
       "      <td>Houston Astros</td>\n",
       "      <td>2017</td>\n",
       "      <td>H</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>115</th>\n",
       "      <td>Boston Red Sox</td>\n",
       "      <td>2018</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>116</th>\n",
       "      <td>San Diego Padres</td>\n",
       "      <td>2019</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>117 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                   team  year first_initial\n",
       "0      Boston Americans  1903             B\n",
       "1       No World Series  1904             N\n",
       "2       New York Giants  1905             N\n",
       "3     Chicago White Sox  1906             C\n",
       "4          Chicago Cubs  1907             C\n",
       "..                  ...   ...           ...\n",
       "112  Kansas City Royals  2015             K\n",
       "113        Chicago Cubs  2016             C\n",
       "114      Houston Astros  2017             H\n",
       "115      Boston Red Sox  2018             B\n",
       "116    San Diego Padres  2019             S\n",
       "\n",
       "[117 rows x 3 columns]"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"stack\"></a>\n",
    "\n",
    "#### df.stack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    team             Boston Americans\n",
       "     year                         1903\n",
       "     first_initial                   B\n",
       "1    team              No World Series\n",
       "     year                         1904\n",
       "                            ...       \n",
       "115  year                         2018\n",
       "     first_initial                   B\n",
       "116  team             San Diego Padres\n",
       "     year                         2019\n",
       "     first_initial                   S\n",
       "Length: 351, dtype: object"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stack_df = df2.stack()\n",
    "stack_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(stack_df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "team             Chicago Cubs\n",
       "year                     2016\n",
       "first_initial               C\n",
       "dtype: object"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stack_df[113]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Chicago Cubs'"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stack_df[113]['team']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">0</th>\n",
       "      <th>team</th>\n",
       "      <td>Boston Americans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <td>1903</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>first_initial</th>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>team</th>\n",
       "      <td>No World Series</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <td>1904</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">115</th>\n",
       "      <th>year</th>\n",
       "      <td>2018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>first_initial</th>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">116</th>\n",
       "      <th>team</th>\n",
       "      <td>San Diego Padres</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <td>2019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>first_initial</th>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>351 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                  0\n",
       "0   team           Boston Americans\n",
       "    year                       1903\n",
       "    first_initial                 B\n",
       "1   team            No World Series\n",
       "    year                       1904\n",
       "...                             ...\n",
       "115 year                       2018\n",
       "    first_initial                 B\n",
       "116 team           San Diego Padres\n",
       "    year                       2019\n",
       "    first_initial                 S\n",
       "\n",
       "[351 rows x 1 columns]"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(stack_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"unstack\"></a>\n",
    "\n",
    "#### df.unstack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "team           0       Boston Americans\n",
       "               1        No World Series\n",
       "               2        New York Giants\n",
       "               3      Chicago White Sox\n",
       "               4           Chicago Cubs\n",
       "                            ...        \n",
       "first_initial  112                    K\n",
       "               113                    C\n",
       "               114                    H\n",
       "               115                    B\n",
       "               116                    S\n",
       "Length: 351, dtype: object"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unstack_df = df2.unstack()\n",
    "unstack_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(unstack_df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Chicago Cubs'"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unstack_df['team'][113]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">team</th>\n",
       "      <th>0</th>\n",
       "      <td>Boston Americans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>No World Series</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York Giants</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chicago White Sox</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Chicago Cubs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">first_initial</th>\n",
       "      <th>112</th>\n",
       "      <td>K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113</th>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>114</th>\n",
       "      <td>H</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>115</th>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>116</th>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>351 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                   0\n",
       "team          0     Boston Americans\n",
       "              1      No World Series\n",
       "              2      New York Giants\n",
       "              3    Chicago White Sox\n",
       "              4         Chicago Cubs\n",
       "...                              ...\n",
       "first_initial 112                  K\n",
       "              113                  C\n",
       "              114                  H\n",
       "              115                  B\n",
       "              116                  S\n",
       "\n",
       "[351 rows x 1 columns]"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(unstack_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Survey data example with stack and unstack"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('../data/survey_scores_2018_fall.csv', index_col=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>861167</th>\n",
       "      <th>346538</th>\n",
       "      <th>473138</th>\n",
       "      <th>268370</th>\n",
       "      <th>21801</th>\n",
       "      <th>996203</th>\n",
       "      <th>52398</th>\n",
       "      <th>980220</th>\n",
       "      <th>47263</th>\n",
       "      <th>232211</th>\n",
       "      <th>...</th>\n",
       "      <th>165235</th>\n",
       "      <th>366626</th>\n",
       "      <th>944438</th>\n",
       "      <th>818177</th>\n",
       "      <th>287628</th>\n",
       "      <th>189753</th>\n",
       "      <th>783121</th>\n",
       "      <th>474873</th>\n",
       "      <th>481858</th>\n",
       "      <th>318480</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>subject_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>computer_has</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>...</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>computer_os</th>\n",
       "      <td>Mac</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac;Windows</td>\n",
       "      <td>Windows</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac</td>\n",
       "      <td>...</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Windows</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Windows</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Mac</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>score_command</th>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Experienced</td>\n",
       "      <td>None</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>score_r</th>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>score_matlab</th>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Experienced</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>None</td>\n",
       "      <td>Experienced</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>score_perl</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>score_python</th>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>7 rows × 31 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "              861167        346538       473138 268370     21801 996203  \\\n",
       "subject_id                                                                \n",
       "computer_has     Yes           Yes          Yes    Yes       Yes    Yes   \n",
       "computer_os      Mac  Windows;Unix          Mac    Mac       Mac    Mac   \n",
       "score_command   Some          Some     Moderate   Some  Moderate   Some   \n",
       "score_r         Some          Some         None   Some      Some   None   \n",
       "score_matlab    Some      Moderate  Experienced   None      None   Some   \n",
       "score_perl      None          None         None   None      None   None   \n",
       "score_python    Some      Moderate     Moderate   Some      Some   Some   \n",
       "\n",
       "                     52398    980220     47263    232211  ...   165235  \\\n",
       "subject_id                                                ...            \n",
       "computer_has           Yes       Yes       Yes       Yes  ...      Yes   \n",
       "computer_os    Mac;Windows   Windows       Mac       Mac  ...      Mac   \n",
       "score_command         Some      Some  Moderate  Moderate  ...     None   \n",
       "score_r               None  Moderate      Some      None  ...     None   \n",
       "score_matlab          Some      None      None      None  ...     None   \n",
       "score_perl            None      Some      None      None  ...     None   \n",
       "score_python      Moderate      None      Some  Moderate  ...     None   \n",
       "\n",
       "                 366626    944438    818177 287628            189753 783121  \\\n",
       "subject_id                                                                    \n",
       "computer_has        Yes       Yes       Yes    Yes               Yes    Yes   \n",
       "computer_os     Windows       Mac   Windows    Mac  Mac;Windows;Unix    Mac   \n",
       "score_command      None      None      Some   Some       Experienced   None   \n",
       "score_r            None      None  Moderate   Some              None   Some   \n",
       "score_matlab   Moderate  Moderate  Moderate   None       Experienced   None   \n",
       "score_perl         None      None      Some   None              None   None   \n",
       "score_python   Moderate      Some      None   None              Some   None   \n",
       "\n",
       "                 474873    481858 318480  \n",
       "subject_id                                \n",
       "computer_has        Yes       Yes    Yes  \n",
       "computer_os    Mac;Unix       Mac    Mac  \n",
       "score_command  Moderate      Some   Some  \n",
       "score_r        Moderate  Moderate   Some  \n",
       "score_matlab       Some      None   None  \n",
       "score_perl         None      None   None  \n",
       "score_python       Some      Some   None  \n",
       "\n",
       "[7 rows x 31 columns]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df.transpose()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>None</td>\n",
       "      <td>Moderate</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>None</td>\n",
       "      <td>Experienced</td>\n",
       "      <td>None</td>\n",
       "      <td>Moderate</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>Experienced</td>\n",
       "      <td>None</td>\n",
       "      <td>Experienced</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Some</td>\n",
       "      <td>Moderate</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Some</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>Some</td>\n",
       "      <td>Some</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id computer_has       computer_os score_command   score_r  \\\n",
       "861167              Yes               Mac          Some      Some   \n",
       "346538              Yes      Windows;Unix          Some      Some   \n",
       "473138              Yes               Mac      Moderate      None   \n",
       "268370              Yes               Mac          Some      Some   \n",
       "21801               Yes               Mac      Moderate      Some   \n",
       "...                 ...               ...           ...       ...   \n",
       "189753              Yes  Mac;Windows;Unix   Experienced      None   \n",
       "783121              Yes               Mac          None      Some   \n",
       "474873              Yes          Mac;Unix      Moderate  Moderate   \n",
       "481858              Yes               Mac          Some  Moderate   \n",
       "318480              Yes               Mac          Some      Some   \n",
       "\n",
       "subject_id score_matlab score_perl score_python  \n",
       "861167             Some       None         Some  \n",
       "346538         Moderate       None     Moderate  \n",
       "473138      Experienced       None     Moderate  \n",
       "268370             None       None         Some  \n",
       "21801              None       None         Some  \n",
       "...                 ...        ...          ...  \n",
       "189753      Experienced       None         Some  \n",
       "783121             None       None         None  \n",
       "474873             Some       None         Some  \n",
       "481858             None       None         Some  \n",
       "318480             None       None         None  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "subject_id\n",
       "computer_has     object\n",
       "computer_os      object\n",
       "score_command    object\n",
       "score_r          object\n",
       "score_matlab     object\n",
       "score_perl       object\n",
       "score_python     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [],
   "source": [
    "def score_to_numeric(score):\n",
    "    converter = {'None': 0, 'Some': 1, 'Moderate': 2, 'Experienced': 3}\n",
    "    return converter[score]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "numeric_cols = ['score_command', 'score_r', 'score_matlab', 'score_perl', 'score_python']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [],
   "source": [
    "for col in numeric_cols:\n",
    "    df[col] = df[col].apply(score_to_numeric)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>Yes</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id computer_has       computer_os  score_command  score_r  \\\n",
       "861167              Yes               Mac              1        1   \n",
       "346538              Yes      Windows;Unix              1        1   \n",
       "473138              Yes               Mac              2        0   \n",
       "268370              Yes               Mac              1        1   \n",
       "21801               Yes               Mac              2        1   \n",
       "...                 ...               ...            ...      ...   \n",
       "189753              Yes  Mac;Windows;Unix              3        0   \n",
       "783121              Yes               Mac              0        1   \n",
       "474873              Yes          Mac;Unix              2        2   \n",
       "481858              Yes               Mac              1        2   \n",
       "318480              Yes               Mac              1        1   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  \n",
       "861167                 1           0             1  \n",
       "346538                 2           0             2  \n",
       "473138                 3           0             2  \n",
       "268370                 0           0             1  \n",
       "21801                  0           0             1  \n",
       "...                  ...         ...           ...  \n",
       "189753                 3           0             1  \n",
       "783121                 0           0             0  \n",
       "474873                 1           0             1  \n",
       "481858                 0           0             1  \n",
       "318480                 0           0             0  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "subject_id\n",
       "computer_has     object\n",
       "computer_os      object\n",
       "score_command     int64\n",
       "score_r           int64\n",
       "score_matlab      int64\n",
       "score_perl        int64\n",
       "score_python      int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  score_command  score_r  score_matlab  score_perl  score_python\n",
       "861167                  1        1             1           0             1\n",
       "346538                  1        1             2           0             2\n",
       "473138                  2        0             3           0             2\n",
       "268370                  1        1             0           0             1\n",
       "21801                   2        1             0           0             1\n",
       "...                   ...      ...           ...         ...           ...\n",
       "189753                  3        0             3           0             1\n",
       "783121                  0        1             0           0             0\n",
       "474873                  2        2             1           0             1\n",
       "481858                  1        2             0           0             1\n",
       "318480                  1        1             0           0             0\n",
       "\n",
       "[31 rows x 5 columns]"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# just select the numeric columns (exclude first 2 columns)\n",
    "df.iloc[:,2:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### df.stack() and df.unstack() (again)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "        subject_id   \n",
       "861167  score_command    1\n",
       "        score_r          1\n",
       "        score_matlab     1\n",
       "        score_perl       0\n",
       "        score_python     1\n",
       "                        ..\n",
       "318480  score_command    1\n",
       "        score_r          1\n",
       "        score_matlab     0\n",
       "        score_perl       0\n",
       "        score_python     0\n",
       "Length: 155, dtype: int64"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:,2:].stack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "subject_id           \n",
       "score_command  861167    1\n",
       "               346538    1\n",
       "               473138    2\n",
       "               268370    1\n",
       "               21801     2\n",
       "                        ..\n",
       "score_python   189753    1\n",
       "               783121    0\n",
       "               474873    1\n",
       "               481858    1\n",
       "               318480    0\n",
       "Length: 155, dtype: int64"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:,2:].unstack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"values\"></a>\n",
    "\n",
    "#### values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "861167    1\n",
       "346538    1\n",
       "473138    2\n",
       "268370    1\n",
       "21801     2\n",
       "         ..\n",
       "189753    3\n",
       "783121    0\n",
       "474873    2\n",
       "481858    1\n",
       "318480    1\n",
       "Name: score_command, Length: 31, dtype: int64"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sometimes you need the values of a DataFrame, not the DataFrame representation of it\n",
    "\n",
    "df.score_command"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 0, 1, 2, 0, 1, 1, 1, 1, 1, 1, 0,\n",
       "       0, 0, 1, 1, 3, 0, 2, 1, 1])"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.score_command.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### value_counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    18\n",
       "2     6\n",
       "0     6\n",
       "3     1\n",
       "Name: score_command, dtype: int64"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.score_command.value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"apply\"></a>\n",
    "<a id=\"lambda\"></a>\n",
    "\n",
    "#### apply, lambda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = lambda x: x.max() - x.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = lambda x: x.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "subject_id\n",
       "score_command    3\n",
       "score_r          3\n",
       "score_matlab     3\n",
       "score_perl       1\n",
       "score_python     2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[numeric_cols].apply(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    18\n",
       "2     6\n",
       "0     6\n",
       "3     1\n",
       "Name: score_command, dtype: int64"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['score_command'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6</td>\n",
       "      <td>12</td>\n",
       "      <td>15</td>\n",
       "      <td>28.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>18</td>\n",
       "      <td>12</td>\n",
       "      <td>7</td>\n",
       "      <td>3.0</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  score_command  score_r  score_matlab  score_perl  score_python\n",
       "0                       6       12            15        28.0          11.0\n",
       "1                      18       12             7         3.0          13.0\n",
       "2                       6        6             4         NaN           7.0\n",
       "3                       1        1             5         NaN           NaN"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[numeric_cols].apply(g)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"map\"></a>\n",
    "\n",
    "#### map"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "subject_id\n",
       "computer_has     object\n",
       "computer_os      object\n",
       "score_command     int64\n",
       "score_r           int64\n",
       "score_matlab      int64\n",
       "score_perl        int64\n",
       "score_python      int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can change the Y/N strings to boolean True/False using a dict and map\n",
    "yn_tf = {'Yes': True, 'No': False}\n",
    "df['computer_has'] = df['computer_has'].map(yn_tf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_r  \\\n",
       "861167              True               Mac              1        1   \n",
       "346538              True      Windows;Unix              1        1   \n",
       "473138              True               Mac              2        0   \n",
       "268370              True               Mac              1        1   \n",
       "21801               True               Mac              2        1   \n",
       "...                  ...               ...            ...      ...   \n",
       "189753              True  Mac;Windows;Unix              3        0   \n",
       "783121              True               Mac              0        1   \n",
       "474873              True          Mac;Unix              2        2   \n",
       "481858              True               Mac              1        2   \n",
       "318480              True               Mac              1        1   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  \n",
       "861167                 1           0             1  \n",
       "346538                 2           0             2  \n",
       "473138                 3           0             2  \n",
       "268370                 0           0             1  \n",
       "21801                  0           0             1  \n",
       "...                  ...         ...           ...  \n",
       "189753                 3           0             1  \n",
       "783121                 0           0             0  \n",
       "474873                 1           0             1  \n",
       "481858                 0           0             1  \n",
       "318480                 0           0             0  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "subject_id\n",
       "computer_has       bool\n",
       "computer_os      object\n",
       "score_command     int64\n",
       "score_r           int64\n",
       "score_matlab      int64\n",
       "score_perl        int64\n",
       "score_python      int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"sort_index\"></a>\n",
    "<a id=\"sort_values\"></a>\n",
    "\n",
    "#### sort_index, sort_values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>165235</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>232211</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>874782</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>944438</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>980220</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>996203</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_r  \\\n",
       "165235              True               Mac              0        0   \n",
       "189753              True  Mac;Windows;Unix              3        0   \n",
       "21801               True               Mac              2        1   \n",
       "232211              True               Mac              2        0   \n",
       "268370              True               Mac              1        1   \n",
       "...                  ...               ...            ...      ...   \n",
       "861167              True               Mac              1        1   \n",
       "874782              True      Windows;Unix              2        1   \n",
       "944438              True               Mac              0        0   \n",
       "980220              True           Windows              1        2   \n",
       "996203              True               Mac              1        0   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  \n",
       "165235                 0           0             0  \n",
       "189753                 3           0             1  \n",
       "21801                  0           0             1  \n",
       "232211                 0           0             2  \n",
       "268370                 0           0             1  \n",
       "...                  ...         ...           ...  \n",
       "861167                 1           0             1  \n",
       "874782                 1           0             2  \n",
       "944438                 2           0             1  \n",
       "980220                 0           1             0  \n",
       "996203                 1           0             1  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_index(axis=0, ascending=True, inplace=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "      <th>score_r</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_matlab  \\\n",
       "861167              True               Mac              1             1   \n",
       "346538              True      Windows;Unix              1             2   \n",
       "473138              True               Mac              2             3   \n",
       "268370              True               Mac              1             0   \n",
       "21801               True               Mac              2             0   \n",
       "...                  ...               ...            ...           ...   \n",
       "189753              True  Mac;Windows;Unix              3             3   \n",
       "783121              True               Mac              0             0   \n",
       "474873              True          Mac;Unix              2             1   \n",
       "481858              True               Mac              1             0   \n",
       "318480              True               Mac              1             0   \n",
       "\n",
       "subject_id  score_perl  score_python  score_r  \n",
       "861167               0             1        1  \n",
       "346538               0             2        1  \n",
       "473138               0             2        0  \n",
       "268370               0             1        1  \n",
       "21801                0             1        1  \n",
       "...                ...           ...      ...  \n",
       "189753               0             1        0  \n",
       "783121               0             0        1  \n",
       "474873               0             1        2  \n",
       "481858               0             1        2  \n",
       "318480               0             0        1  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_index(axis=1, ascending=True, inplace=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>463808</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>442558</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>630741</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47263</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>980220</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_r  \\\n",
       "189753              True  Mac;Windows;Unix              3        0   \n",
       "473138              True               Mac              2        0   \n",
       "463808              True       Mac;Windows              1        1   \n",
       "442558              True           Windows              0        0   \n",
       "630741              True               Mac              0        0   \n",
       "...                  ...               ...            ...      ...   \n",
       "47263               True               Mac              2        1   \n",
       "980220              True           Windows              1        2   \n",
       "21801               True               Mac              2        1   \n",
       "268370              True               Mac              1        1   \n",
       "318480              True               Mac              1        1   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  \n",
       "189753                 3           0             1  \n",
       "473138                 3           0             2  \n",
       "463808                 3           0             1  \n",
       "442558                 3           0             0  \n",
       "630741                 3           0             1  \n",
       "...                  ...         ...           ...  \n",
       "47263                  0           0             1  \n",
       "980220                 0           1             0  \n",
       "21801                  0           0             1  \n",
       "268370                 0           0             1  \n",
       "318480                 0           0             0  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values('score_matlab', axis=0, ascending=False, inplace=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>463808</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>630741</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>442558</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>232211</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>733358</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>165235</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_r  \\\n",
       "463808              True       Mac;Windows              1        1   \n",
       "473138              True               Mac              2        0   \n",
       "630741              True               Mac              0        0   \n",
       "442558              True           Windows              0        0   \n",
       "189753              True  Mac;Windows;Unix              3        0   \n",
       "...                  ...               ...            ...      ...   \n",
       "783121              True               Mac              0        1   \n",
       "318480              True               Mac              1        1   \n",
       "232211              True               Mac              2        0   \n",
       "733358              True           Windows              1        0   \n",
       "165235              True               Mac              0        0   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  \n",
       "463808                 3           0             1  \n",
       "473138                 3           0             2  \n",
       "630741                 3           0             1  \n",
       "442558                 3           0             0  \n",
       "189753                 3           0             1  \n",
       "...                  ...         ...           ...  \n",
       "783121                 0           0             0  \n",
       "318480                 0           0             0  \n",
       "232211                 0           0             2  \n",
       "733358                 0           0             1  \n",
       "165235                 0           0             0  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(['score_matlab', 'score_r'], axis=0, ascending=False, inplace=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_r  \\\n",
       "861167              True               Mac              1        1   \n",
       "346538              True      Windows;Unix              1        1   \n",
       "473138              True               Mac              2        0   \n",
       "268370              True               Mac              1        1   \n",
       "21801               True               Mac              2        1   \n",
       "...                  ...               ...            ...      ...   \n",
       "189753              True  Mac;Windows;Unix              3        0   \n",
       "783121              True               Mac              0        1   \n",
       "474873              True          Mac;Unix              2        2   \n",
       "481858              True               Mac              1        2   \n",
       "318480              True               Mac              1        1   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  \n",
       "861167                 1           0             1  \n",
       "346538                 2           0             2  \n",
       "473138                 3           0             2  \n",
       "268370                 0           0             1  \n",
       "21801                  0           0             1  \n",
       "...                  ...         ...           ...  \n",
       "189753                 3           0             1  \n",
       "783121                 0           0             0  \n",
       "474873                 1           0             1  \n",
       "481858                 0           0             1  \n",
       "318480                 0           0             0  \n",
       "\n",
       "[31 rows x 7 columns]"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"isnull\"></a>\n",
    "<a id=\"fillna\"></a>\n",
    "\n",
    "#### df.isnull(), df.fillna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['score_bash'] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "      <th>score_bash</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_r  \\\n",
       "861167              True               Mac              1        1   \n",
       "346538              True      Windows;Unix              1        1   \n",
       "473138              True               Mac              2        0   \n",
       "268370              True               Mac              1        1   \n",
       "21801               True               Mac              2        1   \n",
       "...                  ...               ...            ...      ...   \n",
       "189753              True  Mac;Windows;Unix              3        0   \n",
       "783121              True               Mac              0        1   \n",
       "474873              True          Mac;Unix              2        2   \n",
       "481858              True               Mac              1        2   \n",
       "318480              True               Mac              1        1   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  score_bash  \n",
       "861167                 1           0             1         NaN  \n",
       "346538                 2           0             2         NaN  \n",
       "473138                 3           0             2         NaN  \n",
       "268370                 0           0             1         NaN  \n",
       "21801                  0           0             1         NaN  \n",
       "...                  ...         ...           ...         ...  \n",
       "189753                 3           0             1         NaN  \n",
       "783121                 0           0             0         NaN  \n",
       "474873                 1           0             1         NaN  \n",
       "481858                 0           0             1         NaN  \n",
       "318480                 0           0             0         NaN  \n",
       "\n",
       "[31 rows x 8 columns]"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "      <th>score_bash</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has  computer_os  score_command  score_r  score_matlab  \\\n",
       "861167             False        False          False    False         False   \n",
       "346538             False        False          False    False         False   \n",
       "473138             False        False          False    False         False   \n",
       "268370             False        False          False    False         False   \n",
       "21801              False        False          False    False         False   \n",
       "...                  ...          ...            ...      ...           ...   \n",
       "189753             False        False          False    False         False   \n",
       "783121             False        False          False    False         False   \n",
       "474873             False        False          False    False         False   \n",
       "481858             False        False          False    False         False   \n",
       "318480             False        False          False    False         False   \n",
       "\n",
       "subject_id  score_perl  score_python  score_bash  \n",
       "861167           False         False        True  \n",
       "346538           False         False        True  \n",
       "473138           False         False        True  \n",
       "268370           False         False        True  \n",
       "21801            False         False        True  \n",
       "...                ...           ...         ...  \n",
       "189753           False         False        True  \n",
       "783121           False         False        True  \n",
       "474873           False         False        True  \n",
       "481858           False         False        True  \n",
       "318480           False         False        True  \n",
       "\n",
       "[31 rows x 8 columns]"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isnull()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.fillna(0, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subject_id</th>\n",
       "      <th>computer_has</th>\n",
       "      <th>computer_os</th>\n",
       "      <th>score_command</th>\n",
       "      <th>score_r</th>\n",
       "      <th>score_matlab</th>\n",
       "      <th>score_perl</th>\n",
       "      <th>score_python</th>\n",
       "      <th>score_bash</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>861167</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346538</th>\n",
       "      <td>True</td>\n",
       "      <td>Windows;Unix</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>473138</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268370</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21801</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189753</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Windows;Unix</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>783121</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>474873</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac;Unix</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>481858</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>318480</th>\n",
       "      <td>True</td>\n",
       "      <td>Mac</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>31 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "subject_id  computer_has       computer_os  score_command  score_r  \\\n",
       "861167              True               Mac              1        1   \n",
       "346538              True      Windows;Unix              1        1   \n",
       "473138              True               Mac              2        0   \n",
       "268370              True               Mac              1        1   \n",
       "21801               True               Mac              2        1   \n",
       "...                  ...               ...            ...      ...   \n",
       "189753              True  Mac;Windows;Unix              3        0   \n",
       "783121              True               Mac              0        1   \n",
       "474873              True          Mac;Unix              2        2   \n",
       "481858              True               Mac              1        2   \n",
       "318480              True               Mac              1        1   \n",
       "\n",
       "subject_id  score_matlab  score_perl  score_python  score_bash  \n",
       "861167                 1           0             1         0.0  \n",
       "346538                 2           0             2         0.0  \n",
       "473138                 3           0             2         0.0  \n",
       "268370                 0           0             1         0.0  \n",
       "21801                  0           0             1         0.0  \n",
       "...                  ...         ...           ...         ...  \n",
       "189753                 3           0             1         0.0  \n",
       "783121                 0           0             0         0.0  \n",
       "474873                 1           0             1         0.0  \n",
       "481858                 0           0             1         0.0  \n",
       "318480                 0           0             0         0.0  \n",
       "\n",
       "[31 rows x 8 columns]"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
